Fast Label Extraction in the CDAWG
Authors
Abstract
The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional only to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, such as collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from O(m log log n) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes space proportional to the size of the CDAWG. This implies a reduction from O(m log log n + occ) to O(m + occ) in the time needed to locate all the occ occurrences of the pattern. We also reduce from O(k log log n) to O(k) the time needed to read the k characters of the label of an edge of the suffix tree of T, and we reduce from O(m log log n) to O(m) the time needed to compute the matching statistics between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.
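A minimal sketch of reading part of a label from a straight-line program (SLP), the kind of grammar the abstract says is induced by the reversed CDAWG. The grammar here is a hypothetical toy example, not one derived from a CDAWG, and the rule encoding and the names `expansion_lengths`, `extract`, and `FIB_RULES` are illustrative assumptions. Subtrees whose expansions fall outside the requested range are skipped using precomputed expansion lengths, so reading k characters costs time proportional to k plus the height of the parse tree. Reaching the O(k) bound stated in the abstract requires more than this generic traversal, and the paper's technique is not reproduced here.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding: each nonterminal id maps either to a single
# terminal character or to a pair (left, right) of nonterminal ids that are
# smaller than itself; its expansion is exp(left) + exp(right).
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: Dict[int, Rule]) -> Dict[int, int]:
    """Length of the expansion of every nonterminal, computed bottom-up."""
    lengths: Dict[int, int] = {}
    for x in sorted(rules):  # children have smaller ids, so they come first
        rule = rules[x]
        if isinstance(rule, str):
            lengths[x] = 1
        else:
            lengths[x] = lengths[rule[0]] + lengths[rule[1]]
    return lengths


def extract(rules: Dict[int, Rule], lengths: Dict[int, int],
            start: int, i: int, k: int) -> str:
    """Return characters i .. i+k-1 of the expansion of nonterminal `start`."""
    out: List[str] = []
    end = i + k
    # Each stack entry is (nonterminal, offset of its expansion within exp(start)).
    stack: List[Tuple[int, int]] = [(start, 0)]
    while stack and len(out) < k:
        x, off = stack.pop()
        if off >= end or off + lengths[x] <= i:
            continue  # this subtree lies entirely outside [i, end)
        rule = rules[x]
        if isinstance(rule, str):
            out.append(rule)
        else:
            left, right = rule
            # Push the right child first so the left child is expanded first.
            stack.append((right, off + lengths[left]))
            stack.append((left, off))
    return "".join(out)


# Toy grammar (hypothetical) whose start symbol 6 expands to "abaababa".
FIB_RULES: Dict[int, Rule] = {1: "a", 2: "b", 3: (1, 2), 4: (3, 1), 5: (4, 3), 6: (5, 4)}

if __name__ == "__main__":
    lengths = expansion_lengths(FIB_RULES)
    print(extract(FIB_RULES, lengths, 6, 2, 4))  # aaba
```

Calling `extract(FIB_RULES, lengths, 6, 0, 8)` returns the whole expansion `abaababa`; only nonterminals whose expansions overlap the requested range are ever expanded.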
Similar resources
On-Line Construction of Compact Directed Acyclic Word Graphs
A Compact Directed Acyclic Word Graph (CDAWG) is a space-efficient text indexing structure that can be used in several different string algorithms, especially in the analysis of biological sequences. In this paper, we present a new on-line algorithm for its construction, as well as the construction of a CDAWG for a set of strings.
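A minimal sketch related to on-line construction: the classic suffix-automaton algorithm builds the uncompacted DAWG of a text one character at a time. This is a swapped-in textbook technique, not the algorithm of the paper summarized above; informally, a CDAWG would be obtained by further compacting the DAWG's non-branching paths into single labeled edges, a step omitted here. The names `DAWG`, `extend`, `finalize_counts`, `occurrences`, and `build_dawg` are hypothetical. After construction, end-position counts are propagated along suffix links, so counting the occurrences of a non-empty pattern of length m follows m transitions.

```python
from typing import Dict, List


class DAWG:
    """Suffix automaton (DAWG) of a text, built on-line one character at a time."""

    def __init__(self) -> None:
        self.next: List[Dict[str, int]] = [{}]  # outgoing transitions per state
        self.link: List[int] = [-1]             # suffix link per state
        self.length: List[int] = [0]            # length of the longest string in the state
        self.count: List[int] = [0]             # number of end positions (after finalize)
        self.last = 0                           # state of the whole text read so far

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        self.count.append(0)
        return len(self.next) - 1

    def extend(self, c: str) -> None:
        """Append character c to the text and update the automaton."""
        cur = self._new_state(self.length[self.last] + 1)
        self.count[cur] = 1  # cur accounts for one new end position
        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes the shorter strings of q's class.
                clone = self._new_state(self.length[p] + 1)
                self.next[clone] = dict(self.next[q])
                self.link[clone] = self.link[q]
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def finalize_counts(self) -> None:
        """Propagate end-position counts up suffix links, longest states first."""
        order = sorted(range(1, len(self.next)), key=lambda s: self.length[s], reverse=True)
        for v in order:
            self.count[self.link[v]] += self.count[v]

    def occurrences(self, pattern: str) -> int:
        """Number of occurrences of a non-empty pattern in the text."""
        state = 0
        for c in pattern:
            if c not in self.next[state]:
                return 0
            state = self.next[state][c]
        return self.count[state]


def build_dawg(text: str) -> DAWG:
    dawg = DAWG()
    for c in text:
        dawg.extend(c)
    dawg.finalize_counts()
    return dawg
```

For example, `build_dawg("abaababa").occurrences("aba")` returns 3. A dictionary per state keeps the sketch alphabet-independent at the cost of extra memory, which is exactly the kind of overhead a compacted structure such as the CDAWG aims to reduce.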
A Fast Localization and Feature Extraction Method Based on Wavelet Transform in Iris Recognition
With an increasing emphasis on security, automated personal identification based on biometrics has been receiving extensive attention. Iris recognition, as an emerging biometric recognition approach, is becoming a very active topic in both research and practical applications. In general, a typical iris recognition system includes iris imaging, iris liveness detection, and recognition. This rese...
Page segmentation and classification using fast feature extraction and connectivity analysis
Page segmentation and classification are important parts of the document analysis process. The aim is to extract and classify different parts of the page. This paper proposes an approach in which these two phases are combined. The integration process includes fast feature extraction with rule-based classification and label propagation using connectivity analysis providing classified areas in th...
A Heterologous Enzyme Linked Immunosorbant Assay of Morphine Using Penicillinase as Label
A rapid, sensitive, specific and high-throughput enzyme-linked immunosorbent assay (ELISA) method for determination of morphine in urine samples using penicillinase as label enzyme has been developed. No extraction or chromatography was included in this assay procedure. Immunoglobulin (Ig) purified polyclonal antibodies against a C6-hemisuccinate derivative of morphine (M-C6-HS...
Comparison of the Modified QuEChERS Method and the Conventional Method of Extraction in Forensic Medicine to Detect Methadone in Post-Mortem Urine by GCMS
Background: Extraction of drugs is one of the biggest concerns and the most important part of preparation and determination in forensic medicine. The lack of an easy, efficient and fast extraction method is the most important and most difficult problem despite the development of forensic centers and their being equipped with new diagnostic devices. In the present study, a comparison was conducte...